NSF PAR Search | NSF Public Access Repository


Non-Parametric Neuro-Adaptive Formation Control

https://doi.org/10.1109/TASE.2025.3528501

Verginis, Christos K; Xu, Zhe; Topcu, Ufuk (January 2025, IEEE Transactions on Automation Science and Engineering)

Full Text Available
Robust and Safe Task-Driven Planning and Navigation for Heterogeneous Multi-Robot Teams with Uncertain Dynamics

https://doi.org/10.1109/IROS58592.2024.10802695

Pan, Tianyang; Verginis, Christos K; Kavraki, Lydia E (October 2024, IEEE)

Full Text Available
Joint learning of reward machines and policies in environments with partially known semantics

https://doi.org/10.1016/j.artint.2024.104146

Verginis, Christos K; Koprulu, Cevahir; Chinchali, Sandeep; Topcu, Ufuk (August 2024, Artificial Intelligence)

We study the problem of reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties in the environment, called atomic propositions, and represented by Boolean variables. One unrealistic assumption commonly used in the literature is that the truth values of these propositions are accurately known. In real situations, however, these truth values are uncertain since they come from sensors that suffer from imperfections. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine that encodes the underlying task while learning how to execute it, despite the uncertainties of the propositions’ truth values. In order to address such uncertainties, the algorithm maintains a probabilistic estimate about the truth value of the atomic propositions; it updates this estimate according to new sensory measurements that arrive from exploration of the environment. Additionally, the algorithm maintains a hypothesis reward machine, which acts as an estimate of the reward machine that encodes the task to be learned. As the agent explores the environment, the algorithm updates the hypothesis reward machine according to the obtained rewards and the estimate of the atomic propositions’ truth value. Finally, the algorithm uses a Q-learning procedure for the states of the hypothesis reward machine to determine an optimal policy that accomplishes the task. We prove that the algorithm successfully infers the reward machine and asymptotically learns a policy that accomplishes the respective task.
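The probabilistic estimate described in the abstract can be illustrated with a plain Bayes update for a single atomic proposition. This is a minimal sketch and not the authors' algorithm: the function name `update_belief`, the sensor model (true-positive rate `tpr`, false-positive rate `fpr`), and the sample readings are all assumptions made for illustration.

```python
def update_belief(prior, observation, tpr=0.9, fpr=0.2):
    """Bayes update of P(proposition is true) from one noisy Boolean reading.

    tpr and fpr are hypothetical sensor parameters: the probability that the
    sensor reports True when the proposition is true, respectively false.
    """
    if observation:
        like_true, like_false = tpr, fpr
    else:
        like_true, like_false = 1.0 - tpr, 1.0 - fpr
    evidence = like_true * prior + like_false * (1.0 - prior)
    return like_true * prior / evidence


# Start from an uninformed prior and fold in a few hypothetical readings.
belief = 0.5
for reading in [True, True, False, True]:
    belief = update_belief(belief, reading)
```

Under these assumptions, the belief rises with each positive reading and falls with each negative one. An agent could threshold or sample this value when deciding which transition of the hypothesis reward machine to take.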
Full Text Available
Verifiable and Compositional Reinforcement Learning Systems

https://doi.org/10.1609/icaps.v32i1.19849

Neary, Cyrus; Verginis, Christos; Cubuktepe, Murat; Topcu, Ufuk (June 2022, Proceedings of the International Conference on Automated Planning and Scheduling)

We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems, each of which learns to accomplish a separate subtask, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP) which is used to plan and to analyze compositions of subsystems, and of the collection of low-level subsystems themselves. By defining interfaces between the subsystems, the framework enables automatic decompositions of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual subtask specifications, i.e. achieve the subsystem's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the subsystems; if they each learn a policy satisfying the appropriate subtask specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the subtask specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the subtask specifications to account for the observed shortcomings. The result is an iterative procedure for defining subtask specifications, and for training the subsystems to meet them. As an additional benefit, this procedure allows for particularly challenging or important components of an overall task to be identified automatically, and focused on, during training. Experimental results demonstrate the presented framework's novel capabilities in both discrete and continuous RL settings. A collection of RL subsystems are trained, using proximal policy optimization algorithms, to navigate different portions of a labyrinth environment. A cross-labyrinth task specification is then decomposed into subtask specifications. Challenging portions of the labyrinth are automatically avoided if their corresponding subsystems cannot learn satisfactory policies within allowed training budgets. Unnecessary subsystems are not trained at all. The result is a compositional RL system that efficiently learns to satisfy task specifications.
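The composition guarantee in the abstract can be made concrete for the simplest case, a single chain of subsystems executed one after another. The sketch below assumes that series structure; the paper's pMDP model is more general. The function names and the example probabilities are hypothetical. If each subsystem reaches its exit conditions with at least probability p_i whenever its entry conditions hold, the chain succeeds with probability at least the product of the p_i.

```python
import math


def composed_success_lower_bound(subtask_probs):
    """Lower bound on task success for subsystems executed in series.

    Each entry is the minimum probability with which one subsystem reaches
    its exit conditions, given that its entry conditions are met.
    """
    return math.prod(subtask_probs)


def satisfies_task(subtask_probs, required=0.95):
    """Check the chained bound against the overall task specification."""
    return composed_success_lower_bound(subtask_probs) >= required


# Hypothetical subtask guarantees for a three-stage and a two-stage chain.
three_stage_ok = satisfies_task([0.99, 0.99, 0.98])
two_stage_ok = satisfies_task([0.97, 0.97])
```

When the check fails, as it does for the second chain, at least one subtask specification has to be tightened or the route through the subsystems changed. That is the situation the abstract's parameter-update step is meant to handle.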
Full Text Available
Assured Learning-Based Optimal Control subject to Timed Temporal Logic Constraints

https://doi.org/10.1109/CDC45484.2021.9683417

Fotiadis, Filippos; Verginis, Christos K.; Vamvoudakis, Kyriakos G.; Topcu, Ufuk (December 2021, 2021 IEEE Conference on Decision and Control)

Full Text Available
Augmenting Control Policies with Motion Planning for Robust and Safe Multi-robot Navigation

https://doi.org/10.1109/IROS45743.2020.9341153

Pan, Tianyang; Verginis, Christos K.; Wells, Andrew M.; Kavraki, Lydia E.; Dimarogonas, Dimos V. (October 2020, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
This work proposes a novel method of incorporating calls to a motion planner inside a potential field control policy for safe multi-robot navigation with uncertain dynamics. The proposed framework can handle more general scenes than the control policy and has low computational costs. Our work is robust to uncertain dynamics and quickly finds high-quality paths in scenarios generated from real-world floor plans. In the proposed approach, we attempt to follow the control policy as much as possible, and use calls to the motion planner to escape local minima. Trajectories returned from the motion planner are followed using a path-following controller guaranteeing robustness. We demonstrate the utility of our approach with experiments based on floor plans gathered from real buildings.
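The idea of following the control policy where possible and calling a planner only to escape local minima can be sketched on a grid. This is a discrete stand-in, not the paper's method: it swaps in greedy descent on a Manhattan-distance potential for the potential field controller and breadth-first search for the motion planner. The grid, the function names, and the start and goal cells are all assumptions.

```python
from collections import deque


def neighbors(cell, grid):
    """Yield the free 4-connected neighbors of a cell (0 = free, 1 = wall)."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def potential(cell, goal):
    """Manhattan distance to the goal, used as a stand-in potential."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def plan_escape(start, goal, grid):
    """BFS to the nearest free cell whose potential is lower than at start."""
    stuck_value = potential(start, goal)
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if potential(cell, goal) < stuck_value:
            path = []
            while cell != start:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for nxt in neighbors(cell, grid):
            if nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None  # no cell with lower potential is reachable


def navigate(start, goal, grid):
    """Greedy potential descent; call the planner only at local minima."""
    path, cell, planner_calls = [start], start, 0
    while cell != goal:
        best = min(neighbors(cell, grid),
                   key=lambda n: potential(n, goal), default=None)
        if best is not None and potential(best, goal) < potential(cell, goal):
            cell = best
            path.append(cell)
        else:
            planner_calls += 1
            detour = plan_escape(cell, goal, grid)
            if detour is None:
                return None, planner_calls
            path.extend(detour)
            cell = detour[-1]
    return path, planner_calls


# Hypothetical floor plan: a pocket that traps greedy descent at (2, 2).
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
path, planner_calls = navigate((2, 0), (2, 4), grid)
```

Every loop iteration strictly lowers the potential, either through a greedy step or through a detour that ends at a lower-potential cell, so the loop terminates. The planner is invoked only when the greedy policy is stuck.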
Full Text Available
